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ABSTRACT 


This paper presents a survey and comparison of some computational 
methods and algorithms for gamma and log-gamma functions of com- 
plex arguments. All these methods and algorithms are reported 
recently in the open literature and include Chebyshev approximations, 

Pade^ expansion and Stirling's asymptotic series. The comparison i 
leads to the conclusion that Algorithm 421 published in the Communications 
of ACM by H. Kuki is the best program either for individual application or 
for the inclusion in subroutine libraries. 
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Introduction 


In the past two years there appear in the open literature a number of 
papers on the computation of gamma or log -gamma functions of complex 
arguments: r(z) and jtnr(z). In particular, there are two published 
algorithms, A404 and A421 [3 and 2], there is Luke 1 s analysis 
published in a SIAM Journal [ 4 ], there are Spira's study [ 5 ] and Cody's 
approximations [ 1 ]*, both reported in the Mathematics of Computation. The last 
approximations apply only to special cases where the argument values lie 
on straight lines parallel to the imaginary axis. In this article we attempt 
to compare and discuss the methods or algorithms given in these papers. 

We hope that such investigation may provide several useful functions. 

First, it surveys the recent activities in this area of computational 
mathematics. Second, it provides information on what good methods to 
use in computing this function and thereby eliminating the poor ones. Third, 
it helps to bring out a high-quality algorithm to be recommended either 
for individual use or for the inclusion in program libraries. Last, but 
not least, we hope this study may contribute some ideas to the methods and 
processes of evaluation of mathematical software. 

II. Computational Methods 

For the five papers mentioned we find three distinct methods proposed, 
viz. , Chebyshev rational approximation, asymptotic expansion and Pade 
approximations. All these methods are applicable to arguments confined 
in some segments of the complex plane. An algorithm using any of these 
methods is therefore dependent on some analytic continuation to cover all four 

‘‘p 0 r conciseness, papers of multiple authorship will be referred to by the 
first author. The references will give full citations. 
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quadrants of the complex plane. For the function in question, we have simple 
formulae for such purpose, but even these simple formulae have to be 
implemented with caution in order to increase efficiency and enhance 
accuracy, as we shall discuss subsequently. We find it appropriate here to 
list these formulae for later references in this article. They are, respec- 
tively, well-known formulae for recursion, reflection, conjugation and 
duplication of arguments (Cf. [7 ]): 


n- 1 

r(z+n) = r{z) [1 (z+k) (1) 

k=0 


T(z)r(l-z) = TTcosecTTz, 


(2) 


r(z) = r(z), 


( 3 ) 


r(2z) = 2 2z_1 r(z)r(z+ £ )//n“ 


( 4 ) 


1. Chebyshev Rational Approximations 

First we turn to Cody's approximations, which are applicable to special 
cases of this function and should not be directly compared with the other 
investigations considered here. In his investigation, Cody's main 
concern is to provide minimax approximation for the Colomb phase shift, 
which occurs in the asymptotic behavior of the Coulomb wave functions. 
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and. is defined as 


<r L (11)= ImUnT(L+l+iTl)} 


(5) 


where L is an integer. The zeroth order phase shift is represented 
by 3 sets of rational approximations of the following form: 


o- 0 (Tl) ^ T|(Tl 2 -Tlg)R 1 (jfcm;71 2 ), 0*71*2.0, (6) 

~TlR 2 (j!m;Tl 2 ), 2. 0 * T| <: 4. 0, (7) 

i ar ctan(T|) + T|[ a in( 1 +T| 2 ) + R^(fm; 1 /7| )] 4.0*7], (8) 


where 7)^ is the positive nontrivial zero of <Tq(T|), and the R's are 
rational functions of degree i in the numerator and m in the 
denominator. We note here that the asymptotic approximation as 
expressed in equation (8) can be substantially improved for efficiency. 
To this end we may use 
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from 


<r 0 “ <r_ ^ + tt/2, 


£nr(l+iT() = ( j£nT| + in/2) + 4nr(iT]), 

and an approximation of the form 

<t_j = ImXnr{ill) ^ Tl£nT!-Tl-TT/4-i 3 {^m;l/Tl 2 ) ( 8 a) 

In other words, the arctangent function may be alleviated in equation 

( 8 ). Since it constitutes almost 1/2 of the cost in that equation, this 

amounts to a substantial savings. The difference in efficiency between 

and should be insignificant because these two rational functions 

are approximating respectively the asymptotic behaviors [ 1 /(m +1)] n 
2 “ n 

and [ 1 /T] 1 ' ] , which are almost identical for large T|. 

Cody further suggests that higher order phase shifts may be computed 
from the identity 

L 

= G'gH) + arctanCH/j), ( 9 ) 

j=l 

which comes from the recursion formula ( 1 ). 

In the context of our general discussion in this paper, the main 
application of Cody’s approximations will be for the computation of the 
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gamma function of pure imaginary arguments. Let 


jfcnr(iy) = U + iV. 


( 10 ) 


We then have (also, cf. [ 7 ])> 


and 


X I 

U = 2 i.mr - % to(ysinhny). 


V = «r 0 (y) - tt/2. 


( 11 ) 


( 12 ) 


Cody's approximations for (r^y) are very efficient. For example, for 
a relative truncation error less than 10 , one needs only 4-4 rational 

functions for y ^ 4. 0 and 2-2 rational functions for y > 4.0. 

2. Stirling's Approximation 

We now turn to the use of Stirling's approximation - the subject of the 
articles by Kuki, Lucas and Spira. Of the two well-known versions of 
this approximation, all the authors mentioned have chosen the more 
efficient form, viz. , that for log-gamma 

N 

j£nr(z) = (z- i )£nz - z + ± in(2n) + £ B^/E 2k(2k- 1 )z 2k ~ 1 ] + T N (z) (13) 

k= 1 


where is the truncation error term and B,. are Bernouli numbers. 

N 

There have been reported in the literature several bounds and estimates 
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for the quantity T^. Spira has summarized some of these. We give 
here an updated review of these error analyses and applications. 

(i) Whittaker and Watson [ 8 ] 



^ | B 2N+2 I K ( z ) 

(2N+l)(2N+2) |z| 2N+1 


(14) 


where 


K(z) = upper bound | z^ /(u 2 -t-z 2 ) j , u ^ 0 


(ii) Nielsen [6 ] 


B 


2N+2 1 


-N 1 


(2N+l){2N+2) z 


2N+1 


[ cos( A arg z) ] 


2N+2’ 


|arg z| <tt (15) 


(iii) Spira [5 ] 

|T n | * 2 |b 2n /( 2N-1)|* 1 Im z | ^ ~ , Re z <0, Im z 4 0, 

l T N l * |B 2 n /(2N-1)|- |z| 1_2N , Rez s 0. 

(iv) Lucas and Terrill [3] 


( 16 ) 


|ReT N | < |ReS N+1 |, |lmT N l < jlmS N+1 |, |arg z| * n/4 


(17) 


where is the (N+l)st term in the asymptotic series (13). 

This is actually derived from the bound (14). 
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(v) Kuki [ 2 ] 


| Tj^j s 6, Re z s max{a(€ ), min[b{€), c(€) - | Imz | ]} > 0, (18) 

where a, b and c are constants dependent on a given € and are 

derived from the condition (14). 

The choice of a proper truncation error control is extremely important 
because such choice determines for a desired accuracy the region of 
applicability of Stirling's approximation which in turn affects the efficiency 
and final accuracy rendered by an algorithm. Of the five types of error bounds 
described above, we believe that in general Nielsen's formula is too ex- 
pensive in the requirement of computing a cosine and an arctangent and 
Lucas' formula too inefficient in its exclusion of one-half of the complex 
plane for applicability. Kuki's truncation error control is realistic and 
most efficient, but suffers from serious inflexibility due to the require 1 ' 
ment that the boundary curve need be derived for each different precision 
desired. All in all, we believe that Spira's error bound is a reasonable 
compromise choice for a general algorithm for the complex gamma 
function. It is simple to use and is fairly efficient in its permission of 
the applicability of Stirling's approximation in a large segment of the 
complex plane, thereby minimizing the use of recursion. 

With the proper boundary curve provided by a particular truncation error 
control, Stirling's approximation must be used in combination with some 
or all of the analytic continuation properties given in equations (1) - (4). 
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The following figures show the implementation {or the proposed 
implementation in Spira's case) of the various authors being reviewed. 
For example, in Lucas' implementation, given an argument z on the left 
half of the complex plane formula (2) is used to reflect the computation 
to the right half. Then, if necessary, formula (1) is used to raise the 
argument such that (13) may be applied. Similar remarks apply to the 
other two cases. 
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3 . Pade Approximations 

We next turn to the Pad/ approximation used by Luke [ 4 ]• These 
approximations are derived from those for two incomplete gamma 
functions, viz., T(z) - y(z, 0 ') + F(z,Qf), where a maybe considered a 
free parameter in the context of the present application. The 
approximant takes the following form. 

C (z) G (z) 

r(z+1) = dVt + iTTzT + L n <z) + u m <z >' R l z ) 20 <‘9> 

n m 

where C , D , G and H are constituted of hyper geometric functions 
n n in m ° 

of the type F (ap • *a , b.-'-b ; t), and L and U are truncation 
P <1 1 p 1 q n m 

errors of the two Pade rational functions. The approximants may be 
computed by recurrence relations. Both and satisfy the 
following recurrence, 

(2n-tz) = [ (2n+z)(2n+z + 2) ~ az ]{2n+z+l) C n + n(n+z)(2n+z+2) ^ (20) 

C 0 = a Z e~ a , Cj = a Z e~ a L a+(z+l )(z+2) ]; 

D 0 ~ l> D 1 = ( z+ 1 M z+2_Q! ) 

Similarly both G m and satisfy the following, 

G m+1 = («+2m+2-z)G m - m{m+l-z)G m _ 1 (21) 

Gq = a Z e a z, G^ = a ? e a z(a+l); 

Hq = a, Hi = a(2+a-z). 
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Luke has further provided very realistic estimates on the truncation error 

terms L and U <, For our purpose it suffices to record here selected 
n m 

numerical values for the sake of comparison, 

TABLE 1 

Numerical Estimates of Relative Truncation Errors in Luke's Pade Approximants 


| L n (z, a)/r(z+l) | , z = x+i y, a - 8 


n 

x/y 

0 

2 

4 

6 

8 

10 

11 

0. 5 

. 23(-07) 

. 21(-07) 

. 19(-07) 

. 32(-07) 

. 54(-07) 

. 25(-07) 

16 

0. 5 

. 61(- 16) 

. 56(- 16) 

. 55( - 1 6) 

. 66(-16) 

. 1 4( — 15) 

. 1 8( — 1 5 ) 

10 

2.5 

. 1 3 ( — 07 ) 

. 1 2(-07) 

. 12(-07) 

. 18(-07) 

. 28(-07) 

. 1 3( - 07 ) 

15 

2. 5 

. 41 ( — 1 6) 

. 40(-l6) 

. 3 9 ( - 16) 

. 45( — 1 6) 

. 89{- 16) 

• 1 1(" 1 5) 

9 

4. 5 

. 88(-08) 

. 86(-08) 

. 84(-08) 

. 1 2(-07) 

. 1 7( — 07) 

. 83{— 08) 

14 

4. 5 

. 3 1 ( — 1 6 ) 

. 31(-16) 

. 30(- 16) 

. 35(- 1 6) 

. 64(- 1 6) 

.79{-l6) 




| u m ( z > a)/f(z+l)| , z 

= x+i y, a = 

8 


n 

x/ y 

0 

2 

4 

6 

8 

10 

11 

0. 5 

. 98(- 17) 

. 20(- 14) 

. 52( - 1 2) 

. 84( -10) 

. 89(-08) 

• 65( - 06) 

16 

0. 5 

. 63(-20) 

. 1 4( - 17) 

. 41 ( - 1 5) 

.82(-13) 

. 1 2( — 10) 

. ll(-08) 

20 

0. 5 

. 37(-22) 

. 82(-20) 

. 26( - 17) 

. 59( - 1 5) 

. 9 6( - 1 3 ) 

. 11{“10) 

11 

2. 5 

. 57{-16) 

. 11(-13) 

. 26(-l 1) 

. 35(-09) 

. 29(-07) 

. 16(-05) 

17 

2. 5 

. 69(- 20) 

. 1 5 ( - 17) 

. 42(- 1 5) 

. 81(-13) 

. 10(-10) 

> 93(-09) 

20 

2. 5 

. 1 4( — 21) 

. 30(- 19) 

.93(-17) 

. 20(- 14) 

. 29(~ 1 2) 

. 30(- 1 0) 

11 

4. 5 

. 59(-15) 

.11(-12) 

. 22(~ 1 0) 

. 23(-08) 

. 14{-06) 

.59{-05) 

17 

4. 5 

. 44(- 19) 

.93{-17) 

, 25(-14) 

. 42(- 1 2) 

. 47(- 10) 

. 35(-08) 

20 

4. 5 

. 75(-21) 

. 1 6(— 1 8) 

. 47(- 1 6) 

. 92(-14) 

. 1 2(- 11) 

. ll(-09) 
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III. Discussions and Comparison of Methods 

In the consideration of various competing methods as candidates for 
algorithmic implementation, we believe the following points must be 
examined. 

1 . Truncation Errors in the Approximations 

(a) The choice of relative versus absolute error criterion must be 
determined. Kuki and Spira both use an absolute criterion. This 
is quite consistent in the particular application in both cases, where 
the log-gamma function is being approximated in a region where the 
modulus of the function values typically range in [10, 50]. ( In such 
region the absolute error serves as an upper bound to the relative 
error. Since the approximation can be made quite efficient by the 
proper choice of region (e.g. , a 7** 1 degree polynomial for a 10 ^ 
accuracy), the absolute criterion is satisfied at a low cost. On 
the other hand, if the gamma function is to be approximated, then 
relative criterion must be imposed, because the function values 
may be arbitrarily large. We find Luke’s usage of this latter 
criterion also consistent with his application. Lucas' usage of 
relative criterion is somewhat obscured by his error control on 
the components of the function values, which we discuss next. 

{b)For complex-valued functions, the choice between component 
and modulus accuracy is dependent on the particular application 
in question. Therefore for a general-purpose algorithm only 
some rather general arguments may be advanced to favor one over 
the other. In the present context, Kuki has proposed strong argument 
in favor of modulus accuracy: "Since an analytic function maps the 
complex plane locally formally, it maps a circular blur about the 
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correct answer. This means that the concept of vector (or 
modulus) error is the natural one to use for compounding errors 
through successive computational steps." Probably for the same 
reason Luke and Spira have also chosen the modulus accuracy 
criterion. There is an additional reason to favor the choice of 
modulus accuracy in the applications of Kuki and Spira. In these 
cases where absolute accuracy are desired, the modulus of the 
error is an upper bound to the components. 

(c) There is also the question of fixed versus variable precision 
control. In the former case a fixed order approximation is 
predetermined based on an error bound for the worst case. An 
example is Kuki's algorithm where he retains seven terms through 
out the region of application of Stirling's approximation. The 
advantages include the elimination of extra storage and of testing, 
whereas the disadvantage lies in the obvious expense of an 
attempt to satisfy the most pessimistic error bound. Variable 
precision control is imposed through the computation of a 
sequence of approximants until the difference between two con- 
secutive ones is less than some desired tolerance. This type of 
control, as found in the investigations of Luke and Lucas, seems 
to utilize an error bound or estimate optimally, but an expense 
is paid in terms of the testing needed. We believe that in general, 
if the order of approximation for a desired accuracy is a rapidly 
varying function of the argument, this type of control should be 
used, because the most pessimistic bound may incur high cost 
in the use of the approximation. Such consideration applies to 
Luke's method in which the use of variable precision control is, 
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in our opinion, well justified. On the other hand, Lucas 1 usage 
of such control is not as mandatory. In discussing the merit of 
his method relative to Luke’s, Kuki concluded that M this comparison 
presents an example of the high cost of variable precision 
programming". Whereas we agree with Kuki that Luke’s method 
is expensive, we disagree with his contention that the high cost 
is due to variable precision control. In a subsequent paragraph 
we shall discuss the counts of arithmetic operations and shall 
suggest that the cost of Luke’s method comes from the arithmetic 
requirements intrinsic in his approach, 

2. Round-Off and Cancellation Errors 

In addition to truncation, there are two other sources of errors 
in the use of Stirling’s approximation. There is the accumulation 
of round-off errors in the summation of the asymptotic series. 

This summation may be readily arranged as a polynomial of 
fairly low degree. For example, in Kuki's case it is a third and 
seventh degree polynomials for single and double precision, 
respectively; in Lucas' case, since this approximation is applied 
in a region further removed from the origin, the required degree 
is certainly lower than 7. This low-degree polynomial of real 
coefficients is well-conditioned except near a zero where 
significance is lost in the evaluation of the polynomial. But since 
the application of this approximation is limited to large values of 
|z | (say, | z j in [ 10, 3 0 3) > the other terms in the expression (13) 
are also large in magnitudes. Hence full significance is not 
needed for the polynomial which is added to the other terms as a 
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correction. All in all, there is no serious build-up of round-off 

* 

errors in the evaluation of equation (13) and our observation here, 
viewed from an entirely different perspective, is in agreement 
with Kuki’s conclusion. A second source of error, through can- 
cellation, is much more serious. This comes about in the use of 
the logarithmic version of equation (1), in order to be consistent 
with equation (13) and to avoid premature overflow. In evaluating 
this expression, fnr(z) is computed by subtracting a sum from 
j0nr(z+n), which is substantially larger than lnT{z), thereby contri- 
buting the cancellation error. To deal with this problem, Kuki 
employs an excellent maneuver by combining equations (1) and 
(13) and analytically eliminate such cancellation. On the other 
hand, Lucas has not attended to this problem which is certainly 
an important cause of poor relative accuracy reported by him 
and confirmed by the present author. 

In Luke's method, the main computational scheme involves four 

recurrences (20) and (21). Therefore the major concern about 

round-off errors centers on the numerical stability of applying 

these recurrences in a forward direction. Luke provides some 

arguments to contend that such procedure is stable. In particular 

his contention is based on the fact that both E^( z, o) /C^(z, oi) and 

E (z, a)/D (z, a) go to zero rapidly as n ■+ ® where E (z, a) 
n n n 

= D (z,a)/L (z, a), and similarly for and H . We believe 
n n m 111 

more detailed analysis will be needed on this point if Luke's 
method is to be applied. Wimp's recent theoretical work 
[ 9 3 will be helpful in this regard. 
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3. Multiple Values of Log -Gamma 


In the course of computing log-gamma, any use of the complex 
logarithm must be carried out with care to avoid the extraneous 
addition or subtraction of a multiple of 2rr to the imaginary part 
of the result. In Lucas' algorithm, log-gamma is only computed 
as intermediate result to obtain gamma. Therefore any extraneous 
quantity 2krri will not affect the final answer because exp(2krri) = 1. 

In Luke's method, the process is exactly reversed. Here one is 
concerned with the proper way of taking the logarithm of a complex 
result. Analysis by Luke leads to the following result. 

Let 

T(z+1) = K + iL = re 1 ^ +ZTTk ^, (22) 

where cp = tan ^L/K), 0 £ tp £ 2fi, 

k = 0, 1, 2, ... , 

then k may be determined by the following. 

| k-A | < 2 ttA = 0 sin5(4nP-l)+S{p C os § - 1 /2)-sin(?)/(12g)-cp (23) 

where z+1 = ge^. 


In Kuki's approach, precaution is first taken to insure that the 
function computed is continuous in the first quadrant of the 
complex plane. Then the term log sin{TT Z ) as used in the reflection 
formula, is analyzed for analytic continuity and reduction of 
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round-off errors. In the end the reflection formula is implemented 
in the following form 

jfoiT(z) = G(z) - /nH(z) - Anf(l-z) (24) 


where 


G(z) = £n(2rr) + try + in( 1 /2 - x), 

H(z) = (l-e^ 17 ^) + e^ 71 ^ 2 sin^(trx) + i sin(2rrx)]. 


Now inH(z) can be replaced by its principal value for the following 
reason. Since |e | = e^ 17 ^ S 1 if y <• 0, and H(z) = 0 only if 

y = 0 and x is an integer, it means that as a parameter £ varies 
continuously in the lower half of the complex plane, H(£) follows 
a path entirely contained in the circle of radius 1 with the center at 
1 + oi. Therefore the principal value of logH(£) varies continuously 
along the path. 

All in all, we believe that this problem has been treated by all the 
authors carefully and accurately and where it is ignored, 
it is done so with justification. However, equation (23) is rather 
expensive to implement and in our opinion it is desirable to 
simplify this proposed procedure if possible. 
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4, Counts of Arithmetic Operations and External Function Calls 

The number of arithmetic operations for a particular method can be 
estimated a priori. Such estimate provides an indication about the 
relative expense of a method, though the corresponding algorithm 
may still vary substantially depending on how it is implemented. 

For our purpose here we shall express all operations in terms of two 
basic units, viz. , A for real floating point addition and M for real 
floating multiplication. For example, each complex addition will 
be assigned 2A, complex multiplication be given 4M+2A 
division be approximated by 10M+2A, and real division be approx- 
imated by 3M. In order to establish some common ground for 

comparison, we shall consider two target precisions, say, short 
“7 “16 

and long: 10 and 10 , respectively. We shall consider the 

estimates of operations needed for each method to attain a truncation 
error less than these tolerances. Our attention is first drawn to 
the right half of the complex plane. Except for Kuki's all methods 
are symmetric with respect to the real axis, as far as number of 
arithmetic operations are concerned. In Kuki’s case, since the 
conjugation relation is used for arguments in the fourth quadrant, 
it is required only an extra addition, which is of course negligible 
in our kind of estimate. Therefore we can further restrict our 
operation counts to arguments in the first quadrant. Table 3 gives 
the results of such operation counts based on the truncation error 
bounds or estimates provided by each author. We emphasize that 
these counts represent gross estimates only and that numerous 
machine-dependent factors can affect the performance of a 
particular method. For example, two such factors are whether 
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complex arithmetic operations are coded on line in a compiler or 
floating division should be approximated as 3M. Nevertheless, 
even with such uncertainties, our estimates do provide an indication 
of the relative expense of each method. In addition to arithmetic 
operations, attention should be focused on the number and types 
of external functions used. Table 2 provides (in abbreviated 
notations) a summary of functions required: complex logarithm 

(CLOG), complex exponential (CEXP), real logarithm (LOG), real 
exponential (EXP), real inverse tangent (ATAN), and real 
hyperbolic sine (SINH). 

TABLE 2 

Elementary Function Calls Required on the Application of Various Methods For 

Re z s 0 



GAMMA | 

LOG-GAMMA 

Chebyshev Rational 

CEXP + ATAN + 2LOG + SINH 

7 1 T 

sa 2 6 L 

ATAN + 2LOG + SINH 

Stirling Asymptotic 

CEXP + CLOG « 2L 

CLOG » L 

pade Approximation 

CEXP « L 

CEXP + CLOG « 2L 


In order to reduce our comparison to a common basis for discussion, 
we shall consider a unit L which is defined to be the amount of com- 
putation required for the complex logarithm. In a typical complex 
logarithm about two -thirds of the work is due to the computation 
of an inverse tangent and a real logarithm. Therefore the latter 
combination is given (2/3)L in Table 2. Moreover, the computational 
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effort for the complex exponential may be approximated by L. Thus 
in Table 2 we also summarize the units of L required in each type 
of methods considered. However, the units of L given result from 
an optimal counting for each method, which in actual implementation 
may substantially exceed the count given. For example, where the 
logarithmic version of eq. (1) is used, at least one additional call to 
the complex logarithm is required, and where eq. (1) is not used 
optimally, many such calls are invoked (as in Lucas' case). In 
Table 3 we record for each author, together with the total number 
of arithmetic operations, all the additional units of L, either due to 
the need of a functional equation or due to a non-optimal arrange- 
ment. Some comments apply to the individual columns appearing 
in Table 3. For small arguments in Cody's method, no function 
calls are required and therefore the proper fraction of L is 
subtracted. Operation counts for Stirling's approximation are 
obtained in a straightforward manner. A cursory examination of 
eq. (13) shows the minimal requirement of (N+l) complex multi- 
plications, (N+2) complex and 2 real additions. The counting for 
Spira's method is based on a variable-precision error control 
which favors the method somewhat because logical operations are 
not included in the total. For similar reason our counts are 
slightly biased against Kuki's implementation because some 
arithmetic operations are incorporated for the sake of reducing 
round-off errors which are not attended to in the other imple- 
mentations. In the case of Luke's approximation we have computed 
the truncation error estimates for the three rays argz = €, rr/4 and 
tt/2, where € is a small number of the order 10 We use 6 
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because the approximation becomes exact for certain argument 
values on the real axis. Operation counts for Luke’s approximation 
have been carried out by Kuki for a typical set of arguments. Our 
counts differ from Kuki's in yielding more optimistic results. The 
major difference comes from the method of estimating complex 
arithmetic operations. For example, the multiplication of a 
complex number by a real one is considered equivalent to two real 
multiplications rather than one full complex multiplication. In 
some operating systems, the latter may actually be applied. Thus 
our operation counts for Luke's case result in Op(C) = 14A + 2CA 
+ 7M + 6CM + 1 CD and Op(G) = 5A 4- 1CA + 2M + 2CM for the 
two recurrences (20) and (21), where CA stands for complex 
addition, etc. Expressing all operations in units of A and M and 
applying to the approximation (19) for C, D, G and H, we find 
thatLuke T s method requires 2n( 34M + 30A) + 2m( 1 0M + 11A), where 
n and m are dependent on the target precisions. It is apparent 
from Luke's analysis that his method is prohibitively expensive 
for large ]z|. In fact he suggested the usage of Stirling's approx- 
imation in such cases. The error estimates he provided are 
restricted to |z| < 11. We have computed these estimates beyond 
|z[ = 15 and have found that it becomes impractical to apply his 
method in those cases. In Table 3 this fact is noted and in 
general no meaningful operation counts can be provided for jz| > 15. 

A survey of Table 3 provides some meaningful comparison among 
the various methods or implementations. For purely imaginary 
arguments, application of Cody's approximations render the most 
efficiency. This is not surprising because these are essentially 
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best approximations for a particular function of one real variable. 
On the other hand, for large arguments there is room for a 
significant improvement in these approximations, as noted before. 
For an overall coverage of arguments in the entire complex plane, 
Kuki's and Spira's analyses definitely render the best approach in 
efficiency. Whereas Kuki's arrangement is superior in numerical 
stability, Spira's error control excells in simplicity and flexibility. 
Table 3 also shows inferior results for Lucas and Luke, due to 
different reasons. Luke's method is intrinsically expensive. It 
involves (2n + 2m) recurrences which contribute a large number of 
arithmetic operations. Even the saving of one function call for 
some arguments is far outweighed by the large number of arithmetic 
operations. Inefficiency in Lucas' application comes from three 
main sources. First, he applies eq. (1) in a non-optimal way, 
requiring n calls to the complex logarithms. Second, his truncation 
error bound is too restrictive, posing severe confinement in the 
application of Stirling's approximation. Third, his numerical data 
are not arranged carefully to avoid unnecessary overhead in 
arithmetic operations. For example, the eleven divisions involved 
in computing the coefficients C(J) are entirely superfluous. 

So far we have only considered arithmetic operations for argument 
values on the right half complex plane. For those on the left half 
all authors recommend the use of the reflection formula (2), which 
involves a complex sine and, for log-gamma, a complex logarithm. 
Therefore it would be useful to alleviate this use for at least some 
arguments. Spira's analysis is a contribution in this respect. 

From eq. (16) and Fig. 1 (iii) we see how the use of (2) can be 
minimized. 
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TABLE 3 


Counts of arithmetic operations and function calls in addition to those recorded in Table 2, 
Target precisions are 10 ^ for short and 10 ^ for long. 


Cody 


Kuki 


Lucas 


Luke 


Spira 


Ul 

argz 

Short 

Long 

Short 

Long 

Short 

Long 

Short 

Long 

Short 

Long 

i 

0 

\ 


32M+32A+L 

68M+64A+L 

45M+22A+9L 

61M+25A+9L 

768M+682A 

1356M+1240A 

40M+39A+L 

80M+75A+L 

5 

(or € in 

\ 


16M+16A 

52M+48A+L 

45M+1 8A+5L 

61M+25A+5L 

652M+584A 

1260M+ 1 164A 

20M+20A 

60M+56A+L 

10 

Luke 1 8 
case) 

\ 


16M+16A 

32M+28A 

45M+13A 

61M+25A 

568M+536A 

1 156M+1094A 

16M+16A 

36M+31A 

15 


\ 


16M+16A 

32M+28A 

45M+13A 

57M+22A 

396M+406A 

1100M+1062A 

16M+16A 

32M+28A 

20 




16M+16A 

32M+28A 

45M+13A 

61M+25A 

516M+538A 

* 

16M+16A 

28M+25A 

25 


\T 


16M+16A 

32M+28A 

45M+13A 

61M+25A 

644M+572A 

* 

16M+16A 

28M+25A 

30 



16M+16A 

32M+28A 

45M+13A 

57M+22A 

684M+612A 

* 

16M+16A 

28M+25A 

1 

Tl/4 


Y 

32M+32A+L 

68M+64A+L 

45M+27A+10L 

61M+29A+9L 

788M+704A 

1416M+1306A 

40M+39A+L 

80M+75A+L 

5 




16M+16A 

52M+48A+L 

45M+23A+6L 

61M+29A+5L 

704M+656A 

1392M+1324A 

20M+20A 

60M+56A+L 

10 



\ 

16M+16A 

3ZM+26A 

45M+18A+L 

61M+29A 

788M+778A 

1416M+1 380A 

16M+16A 

36M+31A 

15 



\ 

16M+16A 

32M+28A 

45M+18A+L 

57M+26A 

* 

* 

16M+16A 

32M+28A 

20 



\ 

16M+16A 

32M+28A 

45M+18A+L 

61M+29A 

* 

* 

16M+16A 

28M+25A 

25 



\ 

16M+16A 

32M+28A 

45M+18A+L 

61M+29A 

* 

* 

16M+16A 

28M+25A 

30 



\ 

16M+16A 

32M+28A 

45M+1BA+L 

57M+26A 

♦ 

$ 

16M+16A 

20M+25A 

1 

tt/2 

13M+8A-|l 

24M+19A -^L 

36M+36A+L 

72M+68A+L 

45M+23A+10L 

61M+29A+10L 

808M+726A 

I416M+1306A 

40M+39A+L 

80M+75A+L 

5 


12M+5A+^L 

18M+11A+^L 

28M+28A+L 

72M+68A+L 

45M+23A+10L 

61 M+29A+5L 

868M+792A 

1576M+1482A 

20M+20A 

60M+56A+L 

10 


12M+5A+^L 

lBM+llA+^L 

20M+20A 

52M+48A+L 

45M+23A+10L 

61M+29A+10L 

1008M+946A 

1756M+1680A 

16M+16A 

36M+31A 

15 


12M+5A+^L, 

IBM+llA+^L 

20M+20A 

36M+32A 

45M+28A+1 5L 

57M+26A+1 5L 

1 168M+1 122A 

* 

16M+16A 

32M+Z8A 

20 


12M+5A +^L 

1 BM+1 1A + *£L 

20M+20A 

36M+32A 

45M+33A+20L 

61M+29A+20L 

* 


16M+16A 

28M+25A 

25 


12M+5A +^L 

18M+1 1 A + ^L 

20M+20A 

36M432A 

45M+38A+25L 

61M+29A+25L 

* 

* 

16M+1.6A 

28M+25A 

30 


1 2M+5A +^L 

18M+11A+|l 

20M+20A 

36M+32A 

45M+43A+30L 

57M+26A+30L 

* 

* 

16M+16A 

28M+25A 


Convergence is too slow for any practical application. 
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IV . Comparison and Testing of Algorithms 


Of the five authors mentioned above, only two have actually published 
algorithms accompanying their analyses. They appear in this journal as 
Algorithms 404 and 421, both coded in ANSI FORTRAN. In this section we 
mainly attend to the comparison of these two algorithms. As indicated in the 
last section, there is much theoretical evidence to believe in the superiority 
of A421 over A404. This conclusion will be further substantiated here by 
empirical data. Before such quantitative results are presented, it may be 
useful to render a few qualitative remarks about each algorithm. 

1. Comments on Algorithm 404 

(i) The overall algorithm is easy to follow, with sufficient comments 
at strategic places to indicate the different blocks of actions to be 
executed. However, throughout the program, we can find obvious 
instances of inefficient coding. For example, near statement 70,^log{2n 
is actually computed via a call to the logarithm and near statement 
100, A = CMPLX{FXjOAT(I _ 1 ), 0. ) is realized by a statement that causes 
an unnecessary floating multiplication. Another instance of serious 
inefficiency is the generous but unnecessary usage of the complex 
logarithm. As mentioned in the last section, the complex logarithm 
is used n times for recursion. In addition, near statement 120 
the logarithmic version of the reflection formula is used, followed by a 
complex exponential. The last process should be reversed, thereby 
saving a call to the complex logarithm. 

The treatment of the function near the poles is somewhat mysterious 
and misleading. First it was given some remarks on an empirical 
relation between the number of significant figures obtained by 
Stirling's series and the distance from z to the nearest pole Zq, say, 
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6 = z ~ z q - These remarks are irrelevant to the algorithm because 
the use of Stirling's series is confined to the right half of the complex 
plane which does not contain any pole. In fact, the poles appear through 
the term sinrrz. When z is too close to a pole, i. e. , 6 < TOL = limit of 
precision of a computer system in question, the result l/TOL is 
returned. This result can be misleading because TOL is not given 
as an exact machine constant in the algorithm and the result given 
deviates substantially from the true function value which should be 
approximated by 1 /{TOL*z q !). In fact a better approach would have 
been a test on whether z is exactly a negative integer and a return 
for just that case. 

(iii) Numerical data are not in general given in the most efficient form. 

For example, the data vector C(I) could have been stored as 
floating point constants in DATA statements, with the rational forms 
given in comments for conversion to other computers. This 
arrangement saves 11 divisions. AJ.1 these constants can then be 
put together with PI, TOL and IOUT as machine -dependent numbers. 

The change of these 11 clearly identified constants requires trivial 
effort. 

(iv) It is more desirable to have the function subprogram in the form of 
a subroutine so as to include a call to log-gamma. There are two 
reasons for such desire. First, the use of log-gamma allows 
application in a much larger portion of the complex plane. Second, 
the algorithm first computes log-gamma and then takes the 
complex exponential to obtain the result. It would be inefficient for a 
user who desires log- gamma to compute the logarithm of a result 
which is the exponential of the desired answer. 
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(v) A cursory examination of the documentation of this algorithm reveals 
that the authors' testing is far from adequate. The test of the 
reflection formula, which is also used in the algorithm, provides 
very little information about its reliability. In fact, for all values 

of z outside the strip 0 < Real(z) £ 1, this test shows no more than 
the proper incorporation of the formula in the subprogram. The 
testing of the algorithm against n! is likewise deficient. One 
wonders why it was not even tested again T(x) which is provided on 
the IBM 360 Fortran library. The comparison of the algorithm 
against tabulated values may be more thorough, but the authors do 
not provide information on how extensive this was done. Did they 
compare 10 or 100 or 1000 cases? Such inadequate information 
about the testing of this algorithm raises serious quesions about 
its reliability. 

(vi) It would be useful to include in comments a list of external references. 

2. Comments on Algorithm 421 

(i) The overall algorithm is meticulously coded to yield utmost 
efficiency and accuracy. For example a quantity squared is 
accomplished by a multiplication rather than a call to the exponen- 
tiation; such sequence as {(z-fk) /(z+n) } is not computed through 
straightforward addition so that round-off error accumulation is 
minimized. In short, it is a striking example of superb coding. 

(ii) There are sufficient comments for a reader to follow through the 
code. However, there is a lack of identification of the constants 
stored in data. For example, it is- not at all obvious that HL2P is 
£n \f It would be helpful if a group of logical flags (LFO, LF1, 
etc.) be more explicitly identified. 
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(iii) Like A404 there is no list of external references. 

(iv) The answer returned by the algorithm when the argument is too 
close to a singularity should be fi, instead of Q+iCh (0 is the largest 
floating point number representable in the machine. ) 

3. Accuracy Tests on Algorithm 4Z1 

This algorithm is compiled and executed on a UNIVAC 1108 computer, 
with minor changes of the machine-dependent constants EPS, OMEGA and 
DEO. Three stages of testing are carried out, in increasing degree of 
intensity. 

(i) Blunder check - 40 sets of complex results are computed from the 
algorithm and compared with a published numerical table on Pp. 277- 
287 [ 7 ]. These results all agree to the last significant figure given 
by the table. 

(ii) Identity check - The region -30 ^ Re z ^ 30 and -30 £ Imz ^ 30 are 
divided into 2000 strips parallel to the imaginary axis. For a 
uniformly random argument z in each strip £nr(z), jtnr(zbi) and £nr(2z) 
are computed and tested against the duplication formula (4). This 
procedure is repeated for 2000 strips parallel to the real axis. The 
maximum absolute discrepancy from this identity is 1. 3D- 15 for log- 
gamma, which is consistent with the magnitudes of error reported 

by the author of this algorithm. 

(iii) Automatic tabular comparison - For the sake of more thorough 
testing, we have constructed a reference subprogram QPCGAM 
which computes the complex gamma function in extended precision 
using a package of subroutines in 7 0 -bit (about 21 decimal) arithmetic, 
composed by C. L. Lawson and associates at the Jet Propulsion 
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Laboratory. This reference subprogram is based on a Stirling's 

approximation with 9 terms, having a truncation error less than 
-19 

2x10 . QPCGAM is itself subjected to the kind of testing des- 

cribed in (ii) above and yields a maximum absolute discrepancy less 
-18 

than 10 for log-gamma* Algorithm 421 is then compared against 
QPCGAM for 7 rectangular regions of the complex plane* For 
comparison each region is divided into 500 strips parallel to the 
imaginary axis and for each strip a random argument is chosen, 
making a total of 500 test arguments for each region. For log- 
gamma this process is repeated for strips parallel to the real axis. 
The results of the comparison are summarized in Tables 4 and 5 
where ''error 1 ' means the difference between A421 and QPCGAM- 

Performance statistics recorded here render empirical confirmation 
to our qualitative remarks made in the last section. For example, 
we see that the absolute error for log-gamma indeed serves as an 
upper bound to the relative error. We also observe that all the 
precautionary measures taken by Kuki to alleviate cancellation 
error and serious accumulation of round-off error are functioning 
properly. The performance statistics found here are consistent 
with those reported by him, except for the fact the errors found 
by us are uniformly smaller than those by him. The last fact 
can be readily understood in terms of the smaller truncation error 
for long precision arithmetic on the UNIVAC 1108 computer than 
that on the IBM 360 O/S. All in all, our intensive and extensive 
testing has provided us much confidence in the reliability of this 
algorithm. 
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TABLE 4 


Relative Errors for Gamma Function by A421 


Interval of Re z Interval of Im z Max. Relative Error RMS Relative Error 


[o, 15] 

[0, 15] 

CO, 15] 

[15, 30] 

[15, 30] 

[0, 15] 

[15, 30] 

[15, 30] 

[-30,0] 

[0, 30] 

[-30,0] 

[-30, 0] 

[0, 30] 

[-30, 0] 


8. 3D- 17 

1. 2D-17 

2. 2D- 1 6 

3. 2D- 1 7 

1. 0D-16 

1. 7D-17 

1. 7D- 1 6 

3. 2D- 17 

1. 8D-16 

2. 7D-17 

2. 8D-16 

4. 7D-17 

2. 5D-16 

4. 5D-17 
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TABLE 5 


Absolute and Relative Errors for Log-Gamma Function by A421 


(a) Random Arguments in Strips Parallel to Imaginary Axis 


Interval of 
Re z 

Interval of 
Im z 

Maximum 
Relative Error 

RMS 

Relative Error 

Maximum 
Absolute Error 

RMS 

Absolute Error 

CO, 15] 

[o, 15] 

2. 4D-17 

2. ID-18 

1. ID- 16 

2. 4D-17 

[0, 15] 

[ 15, 30] 

4. 2D- 18 

8. 7D-19 

2.7D-16 

6. OD-17 

[15, 30] 

[0, 15] 

3. 4D-18 

7. ID-19 

2, 5D-16 

3. 9D-17 

1 15, 30] 

[15, 30] 

3. 8D-18 

8. 0D-19 

3. ID-16 

6. 5D-17 

[-30, 0] 

[0, 30] 

4. 2D- 1 8 

7. 9D-19 

5. 8D-16 

7. 2D- 1 7 

[-30, 0] 

[-30, 0] 

4. 0D-18 

8. 3D- 19 

4. 0D-16 

7. 8D-17 

[0, 30] 

[-30, 0] 

9. 2D- 1 8 

1. ID- 1 8 

3. OD-16 

5. 6D-17 



(b) 

Random Arguments 

in Strips Parallel to Real Axis 


C 0, 15] 

[0, 15] 

7. 8D-17 

5. ID-18 

1.4D-16 

2. 4D-17 

[0, 15] 

C 15, 30] 

3. 5D-18 

7. 5D-19 

1. 9D-16 

4. 0D-17 

[15, 30] 

[0, 15] 

4. 5D- 18 

9. 3D- 19 

3. ID-16 

6. 2D- 17 

[15, 30] 

[ 15, 30] 

4. 5D-18 

8. ID- 19 

3. 3D- 1 6 

6. 8D-17 

[-30, 0] 

[0, 30] 

6. 4D-18 

9. 2D- 19 

3. 0D-16 

5. 7D-17 

[ -30, 0] 

[ -30, o] 

4. 3D- 18 

7. 8D-19 

5. 3D- 16 

7. 8D-17 

[ 0, 30] 

[-30, 0] 

8. 6D-18 

9. 0D-19 

4. 2D- 1 6 

7. 5D-17 
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4. Accuracy Tests on Algorithm 404 


A404 is compiled and executed on a UNIVAC 1108 computer, with 
minor changes of the machine -dependent constants IOUT, PI and TOL. 
Since this algorithm is written for short precision, it may be compared 
against A421 the validity and reliability of which have been established. 
Automatic tabular comparison as described above yields the performance 
statistics in Table 6. For Re z ^ 0, our results are consistent with those 
reported by Lucas. 


5. Timing Tests on Algorithms 404 and 421 

In a time-sharing operating system precise timing of a computer 
program is not too meaningful, because such timing is dependent on the 
transitory operating environment. For this reason we have conducted a set 
of relative timing tests at different times in a two - day period. The 
average so obtained should serve as a good indication of the efficiency of 
the program tested. As an additional aid to the interpretation of the 
timing tests, the double precision exponential function DEXP is tested 
along with each algorithm so that the efficiency can also be expressed in 
terms of units of DEXP. Thus for each set of tests all three programs are 
executed in the same computer run, for 1000 test arguments selected in 
the same way as described in the accuracy tests, with proper alternation 
between vertical and horizontal strips. The statistics are reported in 
Table 7. These results confirm our earlier remarks on the superiority 
in efficiency of A421 over A404. In fact, it is significant that A421 
yields almost 3 times the precision as A404 and is still better than A404 in 
efficiency. 
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TABLE 6 


Relative Errors for Gamma Function by A404 


(a) Random Arguments in Strips Parallel to Imaginary Axis 


Interval of Re z 


Interval of Imz Max. Relative Error RMS Relative Error 


[o, 15] 

[0, 15] 

1. 4E-6 

2. 3E-7 

[0, 15] 

[ 15, 30] 

3. 8E- 6 

6. 9E-7 

[ 15, 30] 

[ 0, 15] 

2. IE- 6 

4. 4E-7 

[15, 30] 

[15, 30] 

3. 2E-6 

5. 4E-7 

[■15, 0 ] 

[0, 15] 

7. 8E-6 

1. IE-6 

[-15, 0] 

C - 15, 0] 

6. 9E-6 

1. IE-6 

[0, 15] 

[-15, 0] 

2. IE-6 

3. 7E-7 



(b) Random Arguments 

in Strips Parallel to 

Real Axis 

[0, 15] 

[o, 15] 

1. 3E-6 

2.4E-7 

[0, 15] 

[15, 30] 

4. 4E-6 

7. 4E-7 

[15, 30] 

[0, 15] 

2. 5E-6 

4. 6E-7 

T15, 30] 

[ 15, 30] 

2. 9E-6 

5. IE-7 

[-15, 0] 

[0, 15] 

5. 6E-6 

1. IE-6 

1 1 

1 

►— 1 
Ln 

o 

i_~i 

[-15, 0] 

1. IE-5 

1. 2E-6 

[0, 15] 

[-15, 0] 

1. 0E-6 

2. IE-7 


32 


JPL Technical Memorandum 33-686 



TABLE 7 


Average Timing for A421 and A404 
With DEXP as a Reference Program 


Tests 

1 

2 

3 

4 

5 

Average 


A404 

2. 85 msec 
2. 38 

2. 34 

3 , 01 

2. 99 


2.71 


A421 

1.85 msec 
1. 79 

1. 15 

2. 37 
1. 87 


1. 93 


DEXP 

0. 19 msec 
0. 18 
0. 18 
0. 19 
0. 19 

0. 19 
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V. Conclusion 

We have investigated five suggested approaches for the computation 
of the complex gamma function. Our comparison, which is most concerned 
with accuracy and efficiency, leads us to conclude that Cody's approximation 
is best for this function of imaginary arguments and a combination of Kuki's 
and Spira's analyses renders the best method for this function of general 
complex arguments. Furthermore, this comparison also demonstrates 
that Kuji's meticulous rearrangement of mathematical formulae and 
precautionary steps in detail implementation lead to his high-quality 
algorithm. Therefore we recommend without reservation that thisJ 
algorithm be used where appropriate, either in individual application or 
in program libraries. 
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